TAG : Type Auxiliary Guiding for Code Comment Generation
Existing leading code comment generation approaches with the
structure-to-sequence framework ignore the type information of the
interpretation of the code, e.g., operator, string, etc. However, introducing
the type information into the existing framework is non-trivial due to the
hierarchical dependence among the type information. To address these issues,
we propose a Type Auxiliary Guiding encoder-decoder framework for the code
comment generation task, which considers the source code as an N-ary tree with
type information associated with each node. Specifically, our framework
features a Type-associated Encoder and a Type-restricted Decoder, which
together enable adaptive summarization of the source code. We further propose
a hierarchical reinforcement learning method to resolve the training
difficulties of our proposed framework. Extensive evaluations demonstrate the
state-of-the-art performance of our framework on both auto-evaluated metrics
and case studies.
Comment: ACL 2020, Accepted
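To make the type-restriction idea concrete, here is a minimal Python sketch of how a decoder could mask its output distribution by the current AST node's type. The TYPE_VOCAB table, token ids, and function name are hypothetical illustrations, not the paper's actual vocabulary or decoder.

```python
import numpy as np

# Hypothetical mapping from AST node types to the token ids a
# type-restricted decoder would be allowed to emit for that type.
TYPE_VOCAB = {
    "operator": [0, 1, 2],  # e.g. ids for "+", "-", "*"
    "string":   [3, 4],     # ids for string-literal tokens
}

def type_restricted_softmax(logits: np.ndarray, node_type: str) -> np.ndarray:
    """Zero out the probability of tokens incompatible with the node type."""
    mask = np.full_like(logits, -np.inf)
    mask[TYPE_VOCAB[node_type]] = 0.0       # only these tokens stay eligible
    restricted = logits + mask
    exp = np.exp(restricted - restricted.max())
    return exp / exp.sum()

probs = type_restricted_softmax(np.random.randn(6), "operator")
print(probs)  # probability mass only on operator tokens
```

Restricting the candidate set per node type shrinks the decoder's search space, which is one intuition for why type guidance can help summarization.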
Efficient Deep Spiking Multi-Layer Perceptrons with Multiplication-Free Inference
Advancements in adapting deep convolutional architectures for Spiking Neural
Networks (SNNs) have significantly enhanced image classification performance
and reduced computational burdens. However, the inability of
Multiplication-Free Inference (MFI) to harmonize with attention and transformer
mechanisms, which are critical to superior performance on high-resolution
vision tasks, imposes limitations on these gains. To address this, our research
explores a new pathway, drawing inspiration from the progress made in
Multi-Layer Perceptrons (MLPs). We propose an innovative spiking MLP
architecture that uses batch normalization to retain MFI compatibility and
introduces a spiking patch encoding layer to reinforce local feature extraction
capabilities. As a result, we establish an efficient multi-stage spiking MLP
network that effectively blends global receptive fields with local feature
extraction for comprehensive spike-based computation. Without relying on
pre-training or sophisticated SNN training techniques, our network secures a
top-1 accuracy of 66.39% on the ImageNet-1K dataset, surpassing the directly
trained spiking ResNet-34 by 2.67%. Furthermore, we curtail computational
costs, model capacity, and simulation steps. An expanded version of our network
challenges the performance of the spiking VGG-16 network with a 71.64% top-1
accuracy, all while operating with a model capacity 2.1 times smaller. Our
findings accentuate the potential of our deep SNN architecture in seamlessly
integrating global and local learning abilities. Interestingly, the trained
receptive field in our network mirrors the activity patterns of cortical cells.
Comment: 11 pages, 6 figures
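As a rough illustration of how batch normalization can coexist with multiplication-free inference, the PyTorch sketch below builds a toy spiking MLP block whose inputs and outputs are binary spikes. This is an assumption-laden simplification, not the paper's architecture: it omits the spiking patch encoding layer, multi-stage design, and surrogate-gradient training.

```python
import torch
import torch.nn as nn

class SpikingMLPBlock(nn.Module):
    """Toy spiking block: Linear -> BatchNorm -> binary spike.

    Because the inputs to the Linear layer are 0/1 spikes, its matrix
    multiply reduces to additions at inference time (multiplication-free),
    and the BatchNorm affine transform can be folded into the firing
    threshold once training is done.
    """
    def __init__(self, in_features: int, out_features: int, threshold: float = 1.0):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)
        self.bn = nn.BatchNorm1d(out_features)
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        membrane = self.bn(self.fc(x))                     # membrane potential
        return (membrane >= self.threshold).float()        # emit binary spikes

block = SpikingMLPBlock(784, 256)
spikes_in = torch.randint(0, 2, (8, 784)).float()          # binary input spikes
spikes_out = block(spikes_in)
print(spikes_out.unique())                                 # values are 0. and/or 1.
```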
VGOS: Voxel Grid Optimization for View Synthesis from Sparse Inputs
Neural Radiance Fields (NeRF) has shown great success in novel view synthesis
due to its state-of-the-art quality and flexibility. However, NeRF requires
dense input views (tens to hundreds) and a long training time (hours to days)
for a single scene to generate high-fidelity images. Although using voxel
grids to represent the radiance field can significantly accelerate the
optimization process, we observe that, for sparse inputs, voxel grids are more
prone to overfitting to the training views and develop holes and floaters,
which leads to artifacts. In this paper, we propose VGOS, an approach
for fast (3-5 minutes) radiance field reconstruction from sparse inputs (3-10
views) to address these issues. To improve the performance of voxel-based
radiance fields in sparse-input scenarios, we propose two methods: (a) We
introduce an incremental voxel training strategy, which prevents overfitting by
suppressing the optimization of peripheral voxels in the early stage of
reconstruction. (b) We use several regularization techniques to smooth the
voxels, which avoids degenerate solutions. Experiments demonstrate that VGOS
achieves state-of-the-art performance for sparse inputs with super-fast
convergence. Code will be available at https://github.com/SJoJoK/VGOS.
Comment: IJCAI 2023, Accepted (Main Track)
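The incremental voxel training strategy can be pictured as gradually enlarging the set of trainable voxels from the grid center outward, with a smoothness penalty discouraging degenerate solutions. The NumPy sketch below illustrates that idea; the growth schedule, function names, and total-variation regularizer are our own assumptions for illustration, not VGOS's actual implementation.

```python
import numpy as np

def center_mask(grid_shape, frac):
    """Boolean mask selecting a centered sub-cube covering `frac` of each axis."""
    mask = np.zeros(grid_shape, dtype=bool)
    slices = tuple(
        slice(int(n * (1 - frac) / 2), int(n * (1 + frac) / 2)) for n in grid_shape
    )
    mask[slices] = True
    return mask

def incremental_update(voxels, grads, step, total_steps, lr=0.1):
    """Only update central voxels early on; grow the trainable region over time."""
    frac = min(1.0, 0.3 + 0.7 * step / total_steps)  # linear schedule (assumed)
    mask = center_mask(voxels.shape, frac)
    voxels[mask] -= lr * grads[mask]                 # peripheral voxels stay frozen
    return voxels

def tv_loss(voxels):
    """Total-variation smoothness penalty over the density grid."""
    return sum(np.abs(np.diff(voxels, axis=a)).mean() for a in range(voxels.ndim))

vox = np.zeros((64, 64, 64))
vox = incremental_update(vox, np.random.randn(*vox.shape), step=0, total_steps=1000)
print(tv_loss(vox))
```

Freezing peripheral voxels early keeps gradients from under-constrained regions from carving holes and floaters before the central geometry stabilizes.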
LiveVV: Human-Centered Live Volumetric Video Streaming System
Volumetric video has emerged as a prominent medium within the realm of
eXtended Reality (XR) with the advancements in computer graphics and depth
capture hardware. Users can fully immerse themselves in volumetric video with
the ability to switch their viewport in six degrees of freedom (6DoF), including
three rotational dimensions (yaw, pitch, roll) and three translational
dimensions (X, Y, Z). Different from traditional 2D videos that are composed of
pixel matrices, volumetric videos employ point clouds, meshes, or voxels to
represent a volumetric scene, resulting in significantly larger data sizes.
While previous works have successfully achieved volumetric video streaming in
video-on-demand scenarios, the live streaming of volumetric video remains an
unresolved challenge due to the limited network bandwidth and stringent latency
constraints. In this paper, we propose the first holistic live volumetric
video streaming system, LiveVV, which achieves multi-view capture, scene
segmentation & reuse, adaptive transmission, and rendering. LiveVV
contains multiple lightweight volumetric video capture modules that can be
deployed without prior preparation. To reduce bandwidth consumption,
LiveVV processes static and dynamic volumetric content separately by reusing
static data with low disparity and decimating data with low visual saliency.
Besides, to deal with network fluctuation, LiveVV integrates a volumetric video
adaptive bitrate streaming algorithm (VABR) to enable fluent playback with the
maximum quality of experience. Extensive real-world experiments show that
LiveVV can achieve live volumetric video streaming at a frame rate of 24 fps
with a latency of less than 350 ms.
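As a rough sketch of the adaptive-bitrate idea behind VABR, the Python snippet below picks the highest bitrate whose per-frame transfer time fits the latency budget. The rate ladder and throughput numbers are hypothetical, and the real VABR algorithm additionally accounts for factors such as viewport, visual saliency, and static-content reuse, none of which are modeled here.

```python
def select_bitrate(bitrates_kbps, bandwidth_kbps, frame_interval_s, latency_budget_s):
    """Pick the highest bitrate whose per-frame transfer fits the latency budget."""
    for rate in sorted(bitrates_kbps, reverse=True):
        frame_bits = rate * 1000 * frame_interval_s        # bits in one frame
        transfer_s = frame_bits / (bandwidth_kbps * 1000)  # time to send that frame
        if transfer_s <= latency_budget_s:
            return rate
    return min(bitrates_kbps)                              # fall back to lowest rate

# 24 fps stream, 350 ms latency budget, measured throughput 80 Mbps (all assumed)
print(select_bitrate([5000, 15000, 40000, 90000], 80000, 1 / 24, 0.35))
```

The key design constraint this captures is that, for live streaming, each frame must fit within a fixed end-to-end latency budget rather than merely matching average throughput.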